A method of synthesis for a steady sound signal



The present invention relates to the field of synthesizing of speech or music,
and more particularly, without limitation, to the field of text-to-speech synthesis.

The function of a text-to-speech (TTS) synthesis system is to synthesize
speech from a generic text in a given language. Nowadays, TTS systems have been put into
practical operation for many applications, such as access to databases through the telephone
network or aid to handicapped people. One method to synthesize speech is by concatenating
elements of a recorded set of subunits of speech such as demisyllables or polyphones. The
majority of successful commercial systems employ the concatenation of polyphones. The
polyphones comprise groups of two (diphones), three (triphones) or more phones and may be
determined from nonsense words, by segmenting the desired grouping of phones at stable
spectral regions. In a concatenation based synthesis, the conservation of the transition
between two adjacent phones is crucial to assure the quality of the synthesized speech. With
the choice of polyphones as the basic subunits, the transition between two adjacent phones is
preserved in the recorded subunits, and the concatenation is carried out between similar
phones.

Before the synthesis, however, the phones must have their duration and pitch
modified in order to fulfil the prosodic constraints of the new words containing those phones.
This processing is necessary to avoid the production of a monotonous sounding synthesized
speech. In a TTS system, a prosodic module performs this function. To allow the duration
and pitch modifications in the recorded subunits, many concatenation based TTS systems
employ the time-domain pitch-synchronous overlap-add (TD-PSOLA) (E. Moulines and F.
Charpentier, "Pitch synchronous waveform processing techniques for text-to-speech
synthesis using diphones," Speech Commun., vol. 9, pp. 453-467, 1990) model of synthesis.
When the signal to be synthesized is required to have an extended duration this is
accomplished by repeating the pitch bells, which have been obtained from the original signal.
This repetition process is illustrated in Fig. 1. Time axis 100 belongs to the time domain of
the original signal. The original signal has a length of T spanning the time interval between
0 and T on the time axis 100. Further, the original signal has a fundamental frequency f,
which corresponds to a period p; pitch bells are obtained from the original signal by
windowing the original signal by means of windows 102. In the example considered here the
windows are spaced apart by the period p in the domain of time axis 100. This way the pitch
bell locations i are determined on time axis 100. Time axis 104 belongs to the time domain of
the signal to be synthesized. The signal to be synthesized is required to have a duration of yT,
where y can be any number. Next a number of pitch bell locations j is determined on the time
axis 104. As on the time axis 100, the pitch bell locations j are spaced apart by the period p
corresponding to the fundamental frequency f of the original signal. In order to increase the
duration of the original signal each of the original pitch bells obtained from the original
signal is repeated a number of y times. This results in a number of intervals 106, 108, ... in
the domain of time axis 104, whereby each of the intervals 106, 108, ... is composed of
repetitions of identical pitch bells. For example the interval 106 contains repetitions of the
pitch bell obtained from the pitch bell location i = 1 from the original signal at pitch bell
locations j (i = 1, k = 1) to j (i = 1, k = y). This means that interval 106 contains a number of
y repetitions of the pitch bell obtained from pitch bell location i = 1 on time axis 100 of the
original signal. Likewise the following interval 108 contains a number of y repetitions of the
pitch bell obtained from pitch bell location i = 2 from the original signal. As a consequence the
synthesized signal is composed of concatenated sequences of pitch bell repetitions.
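
The repetition scheme of Fig. 1 can be pictured as a mapping from required pitch bell
locations j to original pitch bell locations i. The following is a minimal Python sketch,
assuming an integer stretch factor y and 1-based indices; the function name
`repeated_bell_indices` is hypothetical and does not come from the text.

```python
def repeated_bell_indices(num_bells, y):
    """Return, for each required location j = 1..num_bells*y, the index i of
    the original pitch bell used by a repetition-based PSOLA stretch.

    Assumption: y is a positive integer; indices are 1-based as in the text.
    """
    if num_bells < 1 or y < 1:
        raise ValueError("num_bells and y must be positive integers")
    # Each original bell i is repeated y times in a row, which produces the
    # runs of identical bells called intervals 106, 108, ... in Fig. 1.
    return [i for i in range(1, num_bells + 1) for _ in range(y)]


if __name__ == "__main__":
    # Illustrative values only: 3 original bells stretched by y = 4.
    print(repeated_bell_indices(3, 4))
```

Every run of equal indices in the result corresponds to one interval of identical pitch
bells, and the boundaries between runs are where the transitions discussed next occur.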

A common disadvantage of such PSOLA methods is that an extreme duration
manipulation introduces audible transitions between the sequences into the signal. In
particular this is a problem when the original sound is a hybrid sound like voiced fricatives
having both a noisy and a periodic component. The repetition of pitch bells introduces
periodicity in the noisy components, which makes the synthesized signal sound unnatural.

The present invention therefore aims to provide an improved method of
synthesizing a sound signal, in particular for extreme duration modifications, like for singing.

The present invention provides for a method of synthesizing a sound signal
based on an original signal in order to manipulate the duration and/or the pitch of the
original signal. In particular, the present invention enables extreme duration and pitch
modifications of the original signal without audible artefacts. This is especially useful for
synthesizing of singing where extreme duration manipulations in the order of 4 to 100 times
of the original signal can occur.

In essence, the present invention is based on the observation that prior art
PSOLA methods introduce artefacts into a synthesized signal after duration manipulation
because the transition from one chain of repeating pitch bells to the next is audible. This
effect, which is experienced when a prior art PSOLA type method is employed for extreme
duration manipulations, is particularly detrimental for hybrid sounds containing both a noisy
and a periodic component.

In accordance with the invention, pitch bells are randomly selected from the
original signal for each of the required pitch bell locations of the signal to be synthesized.
This way the introduction of periodicity in the noisy components can be avoided and the
naturalness of the original sound is preserved. In accordance with a preferred embodiment of
the invention the original sound is a voiced fricative having both a noisy and a periodic
component. Application of the present invention to such voiced fricatives is especially
beneficial.

In accordance with a further preferred embodiment of the invention a raised
cosine is used for windowing of voiced fricatives. For unvoiced sound intervals a sine
window is used, which has the advantage that the total signal envelope in the power domain
remains about constant. Unlike with a periodic signal, when two noise samples are added, the
total sum can be smaller than the absolute value of either of the two samples. This is because
the signals are (mostly) not in-phase; the sine window adjusts for this effect and removes the
envelope-modulation.
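
The difference between the two windows can be checked numerically. The sketch below
assumes windows of even length m that overlap by half their length, the raised cosine
w[n] = 0.5 - 0.5 cos(2π(n + 0.5)/m), and a sine window of the form sin(π(n + 0.5)/m). The
sine formula, the 50 % overlap and all function names are assumptions made for
illustration.

```python
import math


def raised_cosine(m):
    """Raised cosine window of length m."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 0.5) / m) for n in range(m)]


def sine_window(m):
    """Sine window of length m (assumed form, see the lead-in)."""
    return [math.sin(math.pi * (n + 0.5) / m) for n in range(m)]


def overlap_sums(window):
    """For two copies of `window` shifted by half its length, return the
    amplitude sum and the power (squared) sum over the overlapping half."""
    m = len(window)
    half = m // 2
    amplitude = [window[n] + window[n + half] for n in range(half)]
    power = [window[n] ** 2 + window[n + half] ** 2 for n in range(half)]
    return amplitude, power


if __name__ == "__main__":
    m = 64
    amp_rc, _ = overlap_sums(raised_cosine(m))
    _, pow_sin = overlap_sums(sine_window(m))
    print(min(amp_rc), max(amp_rc))    # amplitude sum of raised cosines
    print(min(pow_sin), max(pow_sin))  # power sum of sine windows
```

Under these assumptions the overlapping raised cosines add up to a constant amplitude,
which suits in-phase periodic components, while the overlapping sine windows add up to a
constant power, which suits noise components whose samples are not in phase.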

In accordance with a further preferred embodiment of the invention the
original sound signal has periods which are spectrally alike and which have basically the
same information content. Such periods which are voiced are classified by means of a first
classifier and such periods which are unvoiced are classified by means of a second classifier.

In accordance with a further preferred embodiment of the invention the
classification information of the original signal is stored in a computer system, such as a
text-to-speech system. Intervals of the original signal which are classified as voiced or unvoiced
steady periods being spectrally alike are processed in accordance with the present invention,
whereby a raised cosine window is used for voiced intervals and a sine window is used for
unvoiced intervals.

In the following, preferred embodiments of the invention are described in
greater detail by making reference to the drawings, in which:

Fig. 1 is illustrative of a prior art PSOLA-type method,

Fig. 2 is illustrative of an example for synthesizing a sound signal in
accordance with an embodiment of the present invention,

Fig. 3 is illustrative of a flow chart of an embodiment of a method of the
present invention,

Fig. 4 shows an example of an original signal and of the synthesized signal,
and

Fig. 5 is a block diagram of a preferred embodiment of a computer system.



Fig. 2 shows an example of synthesizing a signal based on an original signal.
Time axis 200 is illustrative of the time domain of the original signal. The original signal has
a duration T and spans the time between zero and T on time axis 200. The original signal has
a fundamental frequency f which corresponds to a period p. The period p determines
locations i on time axis 200 for windowing of the original signal by means of window 202. In
the example considered here, the original signal is a voiced hybrid sound such that a raised
cosine window in accordance with the following formula is used:



\[
w[n] = 0.5 - 0.5 \cos\left( \frac{2\pi (n + 0.5)}{m} \right), \qquad 0 \le n < m
\]



In the previous relation, m is the length of the window and n is the running index.
When the original signal is an unvoiced sound signal it is preferred to use a sine
window instead.
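
A sine window in the same notation as the raised cosine above can be written as follows.
This particular form, the standard half-period sine window, is an assumption: the text
names the window type, but the expression below is not taken from it.

\[
w[n] = \sin\left( \frac{\pi (n + 0.5)}{m} \right), \qquad 0 \le n < m
\]

With this choice the square of the sine window equals the raised cosine given above, since
\(\sin^2 x = 0.5 - 0.5 \cos 2x\), so the sine window acts on signal power in the way the
raised cosine acts on signal amplitude.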

The time domain of the signal to be synthesized is illustrated by time axis 204.
The signal to be synthesized is required to have a duration of yT, where y can be any number,
for example y = 4 or y = 6 or y = 20 or y = 50 or y = 100.

The period p also determines the pitch bell locations j on time axis 204.
As on time axis 200, the pitch bell locations are spaced apart by period p. For each of the
required pitch bell locations j, a random selection of a location of a pitch bell i in the time
domain of the time axis 200 is made. In the example considered here there is a number of 6
pitch bells which are obtained by windowing of the original signal in the time domain of time
axis 200. To select one of these obtained pitch bells for a pitch bell location j, a random
number between 1 and 6 is generated. This way a random selection from the available pitch
bells on pitch bell locations i = 1 to i = 6 is made. This process is repeated for all required
pitch bell locations j on time axis 204. For example, a pitch bell for the required pitch bell
location j = 1 is selected by generating a random number between 1 and 6. In the example
considered here, the number 6 is obtained such that the pitch bell obtained from pitch bell
location i = 6 on the time axis 200 is selected for the required pitch bell location j = 1 on the
time axis 204. Likewise a random number is generated for the required pitch bell location
j = 2. The random number is 4 in this example such that the pitch bell at pitch bell location
i = 4 on time axis 200 is selected for the required pitch bell location j = 2. This process is
performed for all required pitch bell locations j = 1 to j = n on time axis 204. Due to the
random selection of the pitch bells from the domain of the original signal, intervals 106, 108,
... are avoided (cf. Fig. 1). As a consequence no such artefact is introduced into the
synthesized signal and the synthesized signal sounds natural even for extreme duration
manipulations.
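
The selection step described for Fig. 2 can be sketched in a few lines of Python. This is an
illustration under stated assumptions: the draw is taken to be uniform over the available
pitch bells, the optional seed exists only to make the illustration reproducible, and the
function name `random_bell_indices` is hypothetical.

```python
import random


def random_bell_indices(num_bells, num_locations, seed=None):
    """Draw one original pitch bell index i (1-based) for every required
    pitch bell location j = 1..num_locations.

    Assumption: indices are drawn uniformly and independently.
    """
    if num_bells < 1 or num_locations < 0:
        raise ValueError("num_bells must be >= 1 and num_locations >= 0")
    rng = random.Random(seed)  # seed only for reproducible illustration
    # randint includes both end points, matching "between 1 and 6" in the text.
    return [rng.randint(1, num_bells) for _ in range(num_locations)]


if __name__ == "__main__":
    # 6 original pitch bells as in the example; 24 required locations is an
    # illustrative value corresponding to a stretch factor of y = 4.
    print(random_bell_indices(num_bells=6, num_locations=24, seed=0))
```

Unlike a repetition scheme, the resulting index sequence has no guaranteed runs of equal
indices, so no chain of identical pitch bells is built up deliberately.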

Fig. 3 shows a flow chart, which is illustrative of this method. In step 300 a
recording of an original sound is provided. In step 302 hybrid sound intervals are identified
and classified as voiced or unvoiced in the original sound recording. This can be done
manually by a human expert or by means of a computer program, which analyses the original
signal and/or its frequency spectrum for steady periods. Preferably the first analysis is
performed by means of a program and a human expert reviews the output of the program. In
step 304 pitch bells are obtained from the original sound signal by means of windowing.
Windowing is performed by means of windows which are positioned synchronously with the
fundamental frequency of the original sound signal, i.e. the windows are distanced by the
period p of the original sound signal in the domain of the original sound signal. In step 306
the pitch bell locations j for which pitch bells are required in order to synthesize the signal
are determined. Again the required pitch bell locations j are distanced by the period p.
Alternatively the pitch bell locations j can be distanced by another period q corresponding to
a higher or lower required fundamental frequency of the signal to be synthesized. This way
the duration and the frequency can be modified. In step 308 a random selection of pitch bells
is made for each of the required pitch bell locations j within the sound interval which is
classified as hybrid. For other sound intervals a prior art PSOLA-type method may or may
not be employed. In step 310 the pitch bells are overlapped and added on the pitch bell
locations j in the domain of the signal to be synthesized.
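
Steps 304 to 310 can be sketched end to end in plain Python. The sketch makes several
assumptions that are not stated in the text: periods p and q are integer sample counts,
each window is a raised cosine spanning two periods centred on a pitch bell location, and
the selection is uniform. The function names are hypothetical.

```python
import math
import random


def raised_cosine(m):
    """Raised cosine window of length m."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 0.5) / m) for n in range(m)]


def extract_pitch_bells(signal, p):
    """Step 304: window the signal on locations spaced p samples apart.
    Assumption: each window is 2*p samples long, centred on its location."""
    window = raised_cosine(2 * p)
    bells = []
    centre = p
    while centre + p <= len(signal):
        segment = signal[centre - p:centre + p]
        bells.append([s * w for s, w in zip(segment, window)])
        centre += p
    return bells


def synthesize(signal, p, y, q=None, seed=None):
    """Steps 306 to 310: place required locations q samples apart, pick a
    random pitch bell for each, then overlap and add."""
    q = p if q is None else q
    bells = extract_pitch_bells(signal, p)
    if not bells:
        raise ValueError("signal is shorter than two periods")
    rng = random.Random(seed)
    out_len = int(round(y * len(signal)))
    out = [0.0] * out_len
    centre = p                            # first required location j
    while centre + p <= out_len:
        bell = rng.choice(bells)          # step 308: random selection
        start = centre - p
        for k, value in enumerate(bell):  # step 310: overlap and add
            out[start + k] += value
        centre += q                       # step 306: spacing by period q
    return out


if __name__ == "__main__":
    p = 20
    noise = random.Random(0)
    # Hybrid test signal: a periodic component plus a noisy component.
    original = [math.sin(2.0 * math.pi * n / p) + 0.3 * noise.uniform(-1.0, 1.0)
                for n in range(7 * p)]
    stretched = synthesize(original, p, y=4, seed=1)
    print(len(original), len(stretched))
```

Passing q different from p spaces the required locations by another period and thereby
changes the fundamental frequency of the result; in this sketch the overlapping windows
then no longer sum exactly to one.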

Fig. 4 shows an example of an original sound signal 400 which is a diphone of
/z/ to /z/ transition. Also the frequency spectrum 402 of the sound signal 400 is shown in
Fig. 4.
Sound signal 404 is obtained from sound signal 400 in accordance with the
present invention by randomly selecting pitch bells obtained from the sound signal 400 for
the required pitch bell locations in the time domain of the synthesized sound signal 404. In
the example considered here the synthesized sound signal 404 is y = 6 times longer than the
original sound signal 400. Also the frequency spectrum 406 of the sound signal 404 is shown
in Fig. 4. As apparent from the sound signal 404 and its frequency spectrum 406 the
characteristics of the original sound signal 400 are preserved in the synthesized signal and no
artefacts are introduced. As a consequence the sound signal 404 sounds identical to the sound
signal 400 but is 6 times longer.

Fig. 5 shows a block diagram of a computer system, such as a text-to-speech
synthesis system. The computer system 500 comprises a module 502 for storing of an
original sound signal. Module 504 serves to enter and store sound classification information
for the original sound signal stored in module 502. For example, steady voiced periods are
marked with an 'r' and steady unvoiced periods are marked with an 's' in the original sound
signal. Module 506 serves for windowing of the original sound signal of module 502 in order
to obtain pitch bells. Depending on the sound classification a raised cosine or a sine window
is used for steady voiced periods or steady unvoiced periods, respectively. Module 508 serves
to determine the required pitch bell locations j in the time domain of the signal to be
synthesized. In order to determine the required pitch bell locations j the input parameter
'length y' is utilized. The input parameter length y specifies the multiplication factor for the
duration of the original signal. Further it is possible to provide a dynamically varying pitch as
an additional input parameter to modify the fundamental frequency in addition to or instead
of the duration.
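
The roles of modules 504 to 508 can be illustrated with two small helper functions. The
labels 'r' and 's' come from the description above; everything else is an assumption made
for illustration, including the function names, the sine window formula, the use of
sample counts for durations and periods, and the choice to start the required locations
one period into the signal.

```python
import math


def window_for_label(label, m):
    """Return a window of length m for a classified steady period:
    'r' (steady voiced) -> raised cosine, 's' (steady unvoiced) -> sine."""
    if label == "r":
        return [0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 0.5) / m) for n in range(m)]
    if label == "s":
        # Assumed sine window form; the text only names the window type.
        return [math.sin(math.pi * (n + 0.5) / m) for n in range(m)]
    raise ValueError("unknown classification label: %r" % (label,))


def required_locations(duration, y, period):
    """Required pitch bell locations j (in samples) for a signal of length
    y * duration, spaced `period` samples apart."""
    if period < 1:
        raise ValueError("period must be a positive number of samples")
    total = int(round(y * duration))
    return list(range(period, total - period + 1, period))


if __name__ == "__main__":
    # Illustrative values: a 140-sample interval with period 20, length y = 4.
    print(len(required_locations(140, 4, 20)))
    print(len(window_for_label("s", 40)))
```

A dynamically varying pitch, as mentioned above, would replace the constant `period`
spacing by a spacing that changes from one location to the next; that extension is not
shown here.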

Module 510 serves to select pitch bells from the set of pitch bells obtained
from the original sound signal. Module 510 is coupled to pseudo random number generator
512. For each of the required pitch bell locations in the time domain of the signal to be
synthesized, a pseudo random number is generated by pseudo random number generator 512.
By means of these random numbers selections of pitch bells from the set of pitch bells are
made by module 510 in order to provide a randomly selected pitch bell for each of the
required pitch bell locations in the time domain of the signal to be synthesized. Module 514
serves to perform an overlap and add operation on the selected pitch bells in the time domain
of the signal to be synthesized. This way the synthesized signal having the required duration
is obtained.
It is to be noted that the present invention can be applied on steady regions.
For example, such a steady region can be a vowel or a noisy voiced sound like /z/. Hence, the
invention is not restricted to 'hybrid' sounds.

Furthermore, it is to be noted that the synthesized signal does not need to have
the same pitch (fundamental frequency) as the original. In some applications it is required to
change the pitch, for example in order to synthesize singing. In order to accomplish this
change of fundamental frequency in the synthesized signal, the period locations in the
synthesized signal will be placed closer to or further away from each other than in the
original. This does not otherwise change the synthesis procedure.
Further it is to be noted that the present invention is not restricted to a certain
choice of a window. Instead of raised cosine or sine windows other windows can be used,
such as triangular windows.
CLAIMS:

1. A method of synthesizing a first sound signal based on a second sound signal,
the first sound signal having a required first fundamental frequency and the second sound
signal having a second fundamental frequency, the method comprising the steps of:

determining of required pitch bell locations in the time domain of the first
sound signal, the pitch bell locations being distanced by one period of the first fundamental
frequency,

providing of pitch bells by windowing the second sound signal on pitch bell
locations in the time domain of the second sound signal, the pitch bell locations being
distanced by one period of the second fundamental frequency,

randomly selecting of a pitch bell from the provided pitch bells for each of the
required pitch bell locations,

performing an overlap and add operation on the selected pitch bells for
synthesizing the first signal.



2. The method of claim 1, whereby the second sound signal is a hybrid sound
comprising a noisy and periodic component.

3. The method of claims 1 or 2, the second sound signal being a voiced fricative
sound signal.

4. The method of any one of the preceding claims 1, 2 or 3, the second sound
signal being a voiced sound signal and whereby a raised cosine is used for windowing of the
second sound signal.

5. The method of any one of the preceding claims 1, 2 or 3, the second sound
signal being an unvoiced sound signal and whereby a sine window is used for windowing of
the second sound signal.



PHNL020857EP1W 17.09.2002

6. The method of any one of the preceding claims 1 to 5, the second sound signal
having spectrally alike periods, the spectrally alike periods having basically the same
information content.

7. The method of any one of the preceding claims 1 to 6, the required first
fundamental frequency and the second fundamental frequency being substantially the same.

8. A computer program product, in particular digital storage medium, comprising
program means for synthesizing of a first sound signal based on a second sound signal, the
first sound signal having a required first fundamental frequency and the second sound signal
having a second fundamental frequency, the program means being adapted to perform the
steps of:

determining of required pitch bell locations in the time domain of the first
sound signal, the pitch bell locations being distanced by one period of the first fundamental
frequency,

providing of pitch bells by windowing the second sound signal on pitch bell
locations in the time domain of the second sound signal, the pitch bell locations being
distanced by one period of the second fundamental frequency,

randomly selecting of a pitch bell from the provided pitch bells for each of the
required pitch bell locations,

performing an overlap and add operation on the selected pitch bells for
synthesizing the first signal.

9. A computer system, in particular text-to-speech synthesis system, for
synthesizing a first sound signal based on a second sound signal, the first sound signal having
a required first fundamental frequency and the second sound signal having a second
fundamental frequency, the computer system comprising:

means for determining of required pitch bell locations in the time domain of
the first sound signal, the pitch bell locations being distanced by one period of the first
fundamental frequency,

means for providing of pitch bells by windowing the second sound signal on
pitch bell locations in the time domain of the second sound signal, the pitch bell locations
being distanced by one period of the second fundamental frequency,

means for randomly selecting of a pitch bell from the provided pitch bells for
each of the required pitch bell locations,

means for performing an overlap and add operation on the selected pitch bells
for synthesizing the first signal.

10. The computer system of claim 9 further comprising means for storing of sound
classification data, the means for storing of sound classification data being adapted to store
data being indicative of an interval containing the second sound signal within an original
sound signal.



11. A synthesized signal comprising a number of pitch bells which are
overlapped and added, each of the pitch bells being randomly selected from a set of pitch
bells which are obtained by windowing of an original sound signal on pitch bell locations in
the time domain of the second sound signal, the pitch bell locations being distanced by one
period of the fundamental frequency.

ABSTRACT:

The present invention relates to a method of synthesizing a first sound signal
based on a second sound signal, the first sound signal having a required first fundamental
frequency and the second sound signal having a second fundamental frequency, the method
comprising the steps of:

- determining of required pitch bell locations in the time domain of the first
sound signal, the pitch bell locations being distanced by one period of the first fundamental
frequency,

- providing of pitch bells by windowing the second sound signal on pitch bell
locations in the time domain of the second sound signal, the pitch bell locations being
distanced by one period of the second fundamental frequency,

- randomly selecting of a pitch bell from the provided pitch bells for each of the
required pitch bell locations,

- performing an overlap and add operation on the selected pitch bells for
synthesizing the first signal.

Fig. 3